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Abstract. - We propose a stochastic Gillespie scheme to describe the temporal fluctuations 
of local denaturation zones in double-stranded DNA as a single molecule time series. It is 
demonstrated that the model recovers the equilibrium properties. We also study measurable 
dynamical quantities such as the bubble size autocorrelation function. This efficient compu- 
tational approach will be useful to analyse in detail recent single molecule experiments on 
clamped homopolymer breathing domains, to probe the parameter values of the underlying 
Poland-Scheraga model, as well as to design experimental conditions for similar setups. 



Introduction. - Under a large range of salt conditions and temperatures, the double-helix 
is the thermodynamically stable configuration of DNA [1,2]. This stability is effected by the 
Watson-Crick H-bonding of base-pairs (bps), and the stronger base-stacking of neighbouring, 
planar aromatic bps, that by hydrophobic interactions stabilize the helical structure [3,4]. At 
the same time, double-stranded DNA (dsDNA) is distinguished by the ease with which locally 
the molecule can open up (and later rejoin), to produce flexible bubbles of single-stranded DNA 
(ssDNA), see figure^, [1,2]. The formation of DNA-bubbles, despite the rather large enthalpy 
necessary to break the base- stacking, is made possible by the entropy gain, due to which the 
free energy for breaking additional bps is of the order of fc^T, after overcoming a bubble 
initiation barrier <jq — 1CP 5 . . . 1CP 3 [5]. At room or physiological temperature, bubbles of 20 
to 30 broken bps are created [2] . DNA-bubbles preferentially form in regions rich in the weaker 
AT bps [5] , and they are related to physiological processes such as transcription initiation [6] . 

Traditionally measured in bulk by UV-absorption at elevated temperatures [7], it is now 
possible to probe the time series of the size fluctuations of a single DNA-bubble (DNA- 
breathing) by fluorescence correlation techniques in short, designed DNA-stretches [8]. These 
DNA-constructs contain a well-defined (AT)m breathing domain, that is labelled by a fluoro- 
phorc-qucnchcr pair. Such a model system allows for a precise quantitative analysis, from 
which the parameters of the underlying theoretical model (usually the Poland-Scheraga model) 
can be determined. This fluorescence technique is being developed further to measure longer 
bubble domains, also at higher temperatures. It is therefore of interest to provide a theo- 
retical model to understand DNA-breathing in a homopolymer domain quantitatively. One 
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Fig. 1 - (a) A stretch of dsDNA that is clamped at both ends, with an ssDNA bubble consisting of 
m broken bps. (b) Schematic representation of the jump process underlying bubble-breathing. 



possible approach is based on the master equation for the probability distribution to find 
a certain bubble size at time t, from which quantities such as the mean bubble size or the 
bubble autocorrelation function can be derived [9-11]. Despite its mathematically appealing 
formulation, the master equation needs to be solved numerically by inverting the transfer ma- 
trix [9,11]. Moreover, it produces ensemble-averaged information. Given the access to single 
molecule data, it is of relevance to obtain a model for the fully stochastic time evolution of 
a single DNA-bubble, providing a description for pre-noise-averaged quantities such as the 
step-wise (un)zipping. With this scope, we here introduce a stochastic simulation scheme for 
the (un)zipping dynamics. We use the Gillespie algorithm to update the state of the system 
by determining (i) the random time between individual (un) zipping events, and (ii) which 
reaction direction (zipping, <— , or unzipping, — >) will occur. We corroborate that the model 
recovers the equilibrium properties, and study the single bubble time evolution. The proposed 
scheme is efficient computationally, easy to implement, and amenable to generalization. 

The Model. Motivated by the in vitro study of reference [8], we consider a dsDNA 
segment with M bps that is clamped at both ends (figure ^i). A bubble state of to broken 
bps is defined by the occupation numbers b m = 1 and b m i — (to/ ^ to). The stochastic 
simulation then corresponds to the nearest-neighbour jump process (figure \T]p) 



bo h 



(1) 



with reflecting boundary conditions at bo and 6m- Each jump away from state b m occurs after 
a random time t, and in random direction to either b m -x or 6 m +i, governed by the reaction 
probability density function (PDF)( 1 ) [12] 



P(t,h) = t^(TO)e-( t+(m)+t ~ (m) ) r , 



(2) 



where \i G {+, — } denotes the unzipping (+) or zipping (— ) of a bp, and the jump rates t ± (m.) 
are defined below. From the joint PDF (J2J, the waiting time PDF that a jump away from b m 
occurs is given by — P(t, /i), i.e., it is Poissonian. The probability that the bubble 

size does not change in the time interval [0, t] is given by <p(t) = 1 — J Q ip(r)dT. 



The rates t (to) are based on the statistical weight for a homopolymer bubble [2, 13] 

jF^(to) =cr u m (l + TO)- c 



(3) 



according to the Poland-Scheraga model, with 2f v (0) = 1. Here, we introduced the loop 
initiation factor ctq for breaking the initially unperturbed dsDNA; the statistical weight 



( 1 )The original expression P(r,fl) = imffrnjexp y— T J2m,fi ' > mt M (m)j 
multi-bubble states, simplifies here due to the particular choice of the b„ 



that is relevant for consideration of 
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u = cxp(—E/ksT) with the free energy E = E(T) associated with breaking an additional bp. 
For the (AT) domain (with melting temperature 67° C), u — 0.6 corresponds to physiological 
temperature, and u — 0.9 to » 61°C. We note that in single molecule stretching experi- 
ments the effective temperature can be changed by applying an external torque 2, such that 
u — > uexp(9o1/kBT), where 8q = 27r/10.35 is the twist angle per bp of the dsDNA [14,15]. 
Equation also introduces the loop closure factor (1 + m)~ c assigned to the entropy loss for 
creating a polymer loop of size m; the offset by 1 is due to persistence length corrections [5,16]. 
For the loop closure exponent in the rather short DNA-segment we have in mind here, we 
choose c = 1.76 [5, 17, 18], although higher values have been suggested [13]. Requiring that 
the system eventually reaches equilibrium, the rates (m) have to fulfil the detailed balance 
condition t + (m — l)iF"(m — 1) = t~ ~(m)3f~(m). This condition does not fully specify the 
rates, and we choose the specific form [9, 11] 

t+(0) = 2- c k* u, t+(m) = fc ir ^+ ) 1) = ku , t~(m) = k, (4) 

completed by t~(0) = t + (M) = 0. This choice assumes that the unzipping of a bp is limited by 
the Boltzmann factor u {(Jqu for bubble initiation), whereas bp-zipping occurs with constant 
rate k, specified by the sterical formation of a bp (to be determined from microscopic models). 

Results and discussions. - We start the simulations from the completely zipped state, 
bo = 1 at t = 0, and measure the bubble size at time t in terms of m(t) = J2m=o m ^m(0- The 
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Fig. 3 - Running average of the bubble size m(t) for two realizations (rugged lines) . The straight line 
represents the ensemble average (m) « 4.05 • 10~ 4 for <7q = 10~ 3 , u = 0.6, and M = 20. 



time series of m(t) for a single stochastic realization is shown in figureEl It is distinct that the 
bubble events are very sharp (note the time windows of the zoom-ins) , and most of the time 
the zero-bubble state bo prevails due to do C 1. Moreover, raising the temperature increases 
the bubble size and lifetime, as it should. By construction of the simulation procedure, it is 
guaranteed that an occupation number b rn = 1 (m ^ 0) corresponds to exactly one bubble. 
To study the ergodicity of the stochastic simulation, we compare the running average 

N—l 

m(t) = i -1 AUm(ti) : At l<N _ 1 = t l+1 - U, At N _i = t- tx-i, t <t Nl t — (5) 
i=o 

of the bubble size with the ensemble average, (m) = [cq/jVi) Ylm=i mu m (l + to)~ c , where 
jY\ = 1+ctq 53m=i um (l+ TO )~ c - Note that in equation (JSJ, we sample over the stochastic time 
steps ti chosen from the reaction PDF J5J, such that we need to weight the individual m(ti) by 
the time span Ati until the next jump to m{t i+ i). For long times, the quantity J5J is expected 
to reach the equilibrium value given by the ensemble-average, i.e., Woo = limt_>oo m(t) = (to). 
In figure 01 we display the time evolution of m(t) for two different stochastic trajectories. 
Both approach the ensemble-average (to) for longer times. Relatively large deviations from 
(to) occur, corresponding to a lumping of small or large bubble states. 

In figure we show the bubble size averaged exclusively over time steps Ati during which 
m{ti) ^ 0, as a function of the statistical weight u. As expected, the bubble size increases 
with increasing temperature, and the incorporation of the loop correction term distinctly 
reduces the maximum bubble size at equilibrium. The small error bars for the stochastic 
simulation data and the good agreement with the theoretical prediction X)m=i mum (l + 
m )~ c / E m =i u m {l + m)~ c demonstrate the reliability of the Gillespie algorithm. 

Figure depicts the time-averaged bubble size distribution 

JV-l 

P[m,t)=t- 1 J2 b m(ti)At i , (6) 

i=0 

for runs over a large number of jumps N (see below) to ensure that equilibrium is reached: 
limt^oo P(m, t) = P cq (m). P(m,t) is compared to the ensemble-averaged bubble distribu- 
tion, P cq (m > 1) = (l/^)<7 M ro (l + to)- c , with JT 2 = 1 + <roE™=i^ m (l + ™)~ C and 



S. K. BANIK et al: STOCHASTIC APPROACH TO DNA BREATHING DYNAMICS 



5 




Fig. 4 - Mean bubble size measured exclusively over open bubble states, as a function of statistical 
weight u with c = (left) and c = 1.76 (right). The boxes represent the time average from the 
simulations, the full line is the ensemble-average. The parameters are oo = 10~ 3 and M = 20. 



-foq(O) = (1/^2). This analysis demonstrates that the dsDNA segment almost always re- 
mains completely zipped (since 00 Cl), note the logarithmic ordinate. At room temperature 
the formation of bigger bubbles is a rare event. Conversely, near the melting temperature 
bubble formation is significantly increased (see also figure Eb). Note that in figures 0] and 
we chose the ordinates such that they span the same range for the simulated parameter values, 
to facilitate easy comparison. Again, small error bars and good agreement of the time-average 
with the theoretical ensemble-average are distinct. The Gillespie algorithm governing our 
stochastic simulation therefore reliably leads to relaxation towards equilibrium, as it should, 
given that the detailed balance condition is fulfilled by the transfer rate coefficients t ± (m). 
We determine the time-averaged autocorrelation function through the discretized form 



/ 1 N 

\ n=0 



nT hin )m(nT h - m ) - m^J / ^ m 2 (nT bi „) - m^J , (7) 
Here, N is the number of sample points taken over the trajectory, with sampling time incre- 
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Fig. 5 - Equilibrium bubble size distribution for u = 0.6 (left) and u = 0.9 (right) (<r = 10" 3 ). The 
boxes represent the simulations result (P(m, £)), the full line is the theoretical prediction (P oq (m)). 
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Fig. 6 - Time- averaged (C(t)) and ensemble-averaged (C eq (t), see [9,11]) autocorrelation functions 
versus correlation time for <tq = 10~ 3 , u = 0.6, c = 1.76, and M = 20. 



merits Tbi n = S/k. We chose 5 ~ 10~ 4 . Figure displays the bubble size autocorrelation C(t) 
as obtained from the stochastic simulation in comparison to the ensemble-averaged correlation 
C eq (t) = ({m(f)m(0)) — (m) 2 ) / ((m 2 ) — (m) 2 ); the latter is obtained by numerically solving 
the eigenvalue provlem associated with the master equation, see references [9,11]). Both show 
good agreement. From C(t), we conclude that at u — 0.6, the typical bubble lifetime should 
be of the order of 20/fc. Given the typical experimental bubble lifetime ~ 1 msec at room 
temperature, we extract that the zipping rate k ~ 50/isec, consistent with reference [8]. 

Implementation. - To determine the random waiting time r and the random reaction 
channel [i in the jump process controlled by the reaction PDF P(t, /i), two independent 
random numbers r*i and r-i (r^ 6 [0, 1]) are generated. In the "direct method", one conditions 
the reaction PDF in the form P(r, /i) = ?/>(t)P c (/z|t), where P c (fi\r) is the probability that 
the next reaction is a fi jump, given that the next reaction occurs at time t + r [12]. The 
waiting time r can then be obtained through the relation [12] 

r = r(m) = (l/ [t+(m) +t~(m)]) ln(l/n). (8) 

From ?"2, the direction of the jump is determined as /i = — , if < r2 [t + (m) + t~ (rn)] < t _ (m) 
holds; otherwise, \i = + [12]. A typical run would sample over 10 10 to 10 12 jumps to produce 
a single time series trajectory, where, in general, lower temperature requires longer runs. To 
ensure that equilibrium was reached in such a run, we checked that the quantity of interest 
reached (almost) constant values. For the correlation function, about 10 jumps were sampled 
at increments of 10 _4 /fc. For the time-averages, we calculated the mean over 100 trajectories. 

Conclusions. - By stochastic simulation based on the Gillespie algorithm, we obtained 
the time series of the breathing fluctuations of a single homopolymer DNA bubble, inspired 
by well-defined DNA-constructs employed in recent single molecule fluorescence correlation 
experiments. The waiting time for a jump event corresponding to a single zipping or unzipping 
event defined by the reaction PDF is Poissonian; however, due to the Boltzmann factors in 
the rates, the mean waiting time for a jump to occur can become quite long. 

To corroborate that the stochastic simulation properly describes thermalization, we de- 
termined equilibrium properties such as the mean bubble size and its distribution from time- 
averaging, to find almost perfect agreement with the calculated ensemble-averaged values. 
We also showed that the bubble size autocorrelation function and the running average of the 



S. K. BANIK et al: STOCHASTIC APPROACH TO DNA BREATHING DYNAMICS 



7 



bubble size follow the behaviour predicted by the master equation. The stochastic scheme 
is therefore a reliable way to describe the bubble dynamics. Computationally, the scheme is 
quite efficient, and it can be implemented in a straightforward manner. 

The major advantage of the stochastic approach, similar to the Langevin picture in dif- 
fusion processes, is the possibility to sample single stochastic trajectories and thereby study 
the direct effect of changes in the physical parameters (initiation factor ao, statistical weight 
u, loop closure factor c) on the single bubble time series. This is particularly relevant as it is 
possible, with the current experimental means, to record single bubble time series. Given the 
good convergence of the statistical sampling, comparison of dynamical data as obtained from 
our algorithm to experimental or precise microscopic simulations data will be an important 
way to explore further the validity of the Poland-Scheraga approach to DNA-brcathing, in 
particular, the exact value of the loop closure exponent c and the bubble initiation factor er , 
as well as potential corrections in the Poland-Scheraga model for small bubble domains. 

Further advantages are the relatively straightforward possibility to include types of waiting 
times that arc different from the Poissonian, for instance long-tailed ones [19]. The stochastic 
simulation is also more flexible to easily incorporate additional features such as the coupling 
to the binding dynamics of selectively single-stranded binding proteins to the fluctuating 
bubbles, sequence heterogeneity, or multiple bubbles, due to the formulation of an arbitrary 
number of coupled reactions in the Gillespie algorithm. The corresponding master equation 
approach in such cases quickly becomes intractable, once the number of variables exceeds two. 
It may therefore be desirable to have available a stochastic simulation package designed for 
DNA-breathing similar to the program MELTSIM for DNA-mclting simulations [5]. 

* * * 

We acknowledge very helpful discussions with Andreas Isacsson, and thank Petter Urkedal 
for assistance with the Intel Fortran compiler. 

REFERENCES 

[I] Kornberg, A., DNA Synthesis (W. H. Freeman, San Francisco, CA) 1974. 

[2] Poland, D. and Scheraga, H.A., Theory of helix-coil transitions in biopolymers (Academic 
Press, New York) 1970. 

[3] Watson, J. D. and Crick, F. H. C, Cold Spring Harbor Symp. Quant. Biol, 18 (1953) 123. 
[4] Delcourt, S. G. and Blake, R. D., J. Biol. Chem., 266 (1991) 15160. 
[5] Blake, R. D. et al, Bioinformatics, 15 (1999) 370. 

[6] Kornberg, A. and Baker, T. A., DNA Replication (W. H. Freeman, New York, NY) 1992. 
[7] Wartell, R. M. and Benight, A. S., Phys. Rep., 126 (1985) 67. 

[8] Altan-Bonnet, G., Libchaber, A. and Krichevsky, O., Phys. Rev. Lett., 90 (2003) 138101 
[9] Ambjornsson, T. and Metzler, R., E-print, (q-bio.BM/0411053) . 
[10] Hanke, A. and Metzler, R., J. Phys. A, 36 (2003) L473. 

[II] Ambjornsson, T. and Metzler, R., J. Phys. Cond. Mat, 17 (2005) S1841. 

[12] Gillespie, D.T., J. Comp. Phys., 22 (1976) 403; J. Phys. Chem., 81 (1977) 2340. 

[13] Richard, C. and Guttmann, A. J., J. Stat. Phys., 115 (2004) 925. 

[14] COCCO, S., MONASSON, R. and Marko, J. F., Proc. Natl. Acad. Set., 98 (2001) 8608. 

[15] Pant, K., Karpel, R. L. and Williams, M. C, J. Mol. Biol, 327 (2003) 571. 

[16] Fixman, M. and Freire, J. J., Biopolymers, 16 (1977) 2693. 

[17] Fisher, M. E., J. Chem. Phys., 38 (1966) 112. 

[18] Hanke, A. and Metzler, R., Phys. Rev. Lett., 90 (2003) 159801. 

[19] Metzler, R and Klafter, J., J. Phys. A, 37 (2004) R161. 



